Citi Bike data analysis (2019)

citibike.jpg

Description

Below analysis answers many questions raised by citi bike community like Where do Citi Bikers ride? When do they ride? How far do they go? Which stations are most popular? Busiest bike?

Dataset: https://s3.amazonaws.com/tripdata/index.html

Technologies

  • Neo4j
  • Hadoop Map Reduce
  • Pig
  • Hive

Data analysis using Neo4j

- Full graph

graphfull.png

- Limit 5

neo1.png

Data analysis using Hadoop Mapreduce

In [15]:
import pandas as pd
import matplotlib
In [16]:
top_5_start_station = pd.read_csv("top5StartStation/part-r-00000", index_col="station_name", delim_whitespace=True, names =["station_name", "popularity"])
top_5_start_station
Out[16]:
popularity
station_name
Grove St PATH 20563
Hamilton Park 9537
Sip Ave 8339
Newport PATH 7350
Harborside 6989
In [17]:
top_5_start_station_graph = top_5_start_station.plot(kind='bar')
top_5_start_station_graph
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x2368127a278>
In [18]:
top_5_end_station = pd.read_csv("top5EndStation/part-r-00000", index_col="station_name", delim_whitespace=True, names =["station_name", "popularity"])
top_5_end_station
Out[18]:
popularity
station_name
Grove St PATH 24336
Hamilton Park 9280
Sip Ave 7571
Newport PATH 7134
Harborside 6947
In [19]:
top_5_end_station_graph = top_5_end_station.plot(kind='bar', color='#0099ff')
top_5_end_station_graph
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x236812d5550>
In [20]:
top_5_journey = pd.read_csv("top5Journey/part-r-00000", engine='python', index_col="journey", sep = "\"\s", names =["journey", "popularity"])
top_5_journey
Out[20]:
popularity
journey
"Hamilton Park"->"Grove St PATH 3432
"Grove St PATH"->"Hamilton Park 2695
"Brunswick & 6th"->"Grove St PATH 2162
"Jersey & 6th St"->"Grove St PATH 2012
"Brunswick St"->"Grove St PATH 1990
In [21]:
top_5_journey_graph = top_5_journey.plot(kind='bar', color='#0099ff')
top_5_journey_graph
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x23681332048>

- Bike ride distribution according to age

In [22]:
ride_dist_on_age = pd.read_csv("RideDistributionOnAge/part-r-00000", index_col="age", delim_whitespace=True , names =["age", "count"])
ride_dist_on_age.head()
Out[22]:
count
age
101 1
115 6
119 14
130 4
131 11
In [23]:
# ride_dist_on_age.plot.pie(subplots=True, figsize=(30,30))
ride_dist_on_age.plot.barh(figsize=(25,25),color='#0099ff')
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x2368138f3c8>

- Bike ride distribution based on day hour

In [24]:
ride_dist_on_hour = pd.read_csv("RideDistributionOnDayTime/part-r-00000", index_col="hour", delim_whitespace=True , names =["hour", "count"])
ride_dist_on_hour.head()
Out[24]:
count
hour
0 1085
1 665
2 365
3 248
4 366
In [25]:
ride_dist_on_hour.plot.barh(figsize=(25,25), color='#0099ff')
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x2368144b240>

- Trip durations sorted groaup by startstation (Secondary sort)

CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=3359} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=3590} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=4218} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=4446} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=7437} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=7458} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=8786} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=15434} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=15472} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=15492} CompositeKeyWritablePart4{stationName='"5 Corners Library"', tripduration=15668} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=103} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=109} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=130} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=131} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=136} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=142} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=156} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=158} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=163} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=164} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=165} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=166} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=169} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=170} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=171} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=173} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=175} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=179} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=199} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=199} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=203} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=206} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=207} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=207} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=209} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=222} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=223} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=224} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=228} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=233} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=233} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=241} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=251} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=254} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=255} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=258} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=264} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=269} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=271} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=276} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=280} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=280} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=282} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=284} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=286} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=289} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=289} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=290} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=292} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=294} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=296} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=296} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=296} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=298} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=298} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=299} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=300} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=301} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=301} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=301} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=302} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=303} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=303} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=303} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=304} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=304} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=304} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=304} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=304} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=305} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=305} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=305} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=306} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=307} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=308} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=308} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=309} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=310} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=310} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=310} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=310} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=310} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=311} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=313} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=313} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=313} CompositeKeyWritablePart4{stationName='"Astor Place"', tripduration=314}

Data analysis using Pig

- Average trip duration based on user type

usertype_avg_duration.png

In [26]:
matplotlib.pyplot.pie(x=[3209.65,504.38], autopct='%1.0f%%', labels=["Customer","Subscriber"], radius = 2)
Out[26]:
([<matplotlib.patches.Wedge at 0x23682e7cf28>,
  <matplotlib.patches.Wedge at 0x236836446a0>],
 [Text(-2.0027942693243954, 0.9103928354075295, 'Customer'),
  Text(2.002794354561516, -0.9103926478923925, 'Subscriber')],
 [Text(-1.0924332378133066, 0.49657791022228875, '86%'),
  Text(1.0924332843062812, -0.49657780794130496, '14%')])

- Average trip duration based on gender

gender_avg_duration.png

In [27]:
matplotlib.pyplot.pie(x=[3147.0121,541.8469,698.4854], autopct='%1.0f%%', labels=["unknown","male","female"], radius=2)
Out[27]:
([<matplotlib.patches.Wedge at 0x23683677d30>,
  <matplotlib.patches.Wedge at 0x23683680438>,
  <matplotlib.patches.Wedge at 0x23683680b38>],
 [Text(-1.3878671374359546, 1.7069929141110485, 'unknown'),
  Text(0.3992544348500846, -2.1634684874646406, 'male'),
  Text(1.9305169657373258, -1.0550375562037353, 'female')],
 [Text(-0.7570184386014297, 0.9310870440605717, '72%'),
  Text(0.2177751462818643, -1.1800737204352585, '12%'),
  Text(1.0530092540385412, -0.5754750306565828, '16%')])

Data analysis using Hive

- Ride distribution based on user type

usertype_ride_dist.png

In [28]:
matplotlib.pyplot.pie(x=[14312, 156156], autopct='%1.0f%%', labels=["Customer","Subscriber"], radius = 2)
Out[28]:
([<matplotlib.patches.Wedge at 0x2368153eb00>,
  <matplotlib.patches.Wedge at 0x236815552b0>],
 [Text(2.1239169159812756, 0.573565108778932, 'Customer'),
  Text(-2.1239169696823073, -0.5735649099234764, 'Subscriber')],
 [Text(1.1585001359897864, 0.3128536956975992, '8%'),
  Text(-1.1585001652812583, -0.3128535872309871, '92%')])

- Longest trip duration

maxtripduration.png

Longest trip duration is approx 20 hrs

- Longest trip duration to and from location (subquery)

logesttripjourney.png

- What is the busiest bike in NYC in 2019? How many times was it used?

busiestbike.png

busiestbikedetails.png

Busiest bike id is 29252 and was rode for 319 times

In [ ]: